The authors are to be congratulated for their deep
appreciation of Jeffreys's famous book, Theory of 
Probability, and their very impressive, knowledge- 
able consideration of its contents, chapter by chap- 
ter. Many will benefit from their analyses of topics 
in Jeffreys's book. As they state in their abstract, 
"Our major aim here is to help modern readers in 
navigating this difficult text and in concentrating on 
passages that are still relevant today." From what 
follows, it might have been more accurate to use 
the phrase, "modern well-informed Bayesian statis- 
ticians" rather than "modern readers" since the au- 
thors' discussions assume a rather advanced knowl- 
edge of modern Bayesian statistics. Readers who are 
"just" physicists, chemists, philosophers of science, 
economists, etc., may have great difficulty in un- 
derstanding the authors' guide to Jeffreys's book. 
This is unfortunate since the book provides meth- 
ods and philosophical principles relevant for all the 
sciences. Perhaps in the future, additional reviews
of Jeffreys's book will be prepared that are understandable
to a broader range of readers, as was done
when scientists and scholars from many fields
discussed at length Jeffreys's and others' thoughts on
simplicity and complexity at a conference, as reported
in Zellner, Keuzenkamp and McAleer (2001).

Another point that affects the authors' discussion 
is their apparent misinterpretation of the title of 
Jeffreys's book. They write, "The title itself is mis- 
leading in that there is no exposition of the math- 
ematical bases of probability theory in the sense of 
Billingsley (1986) and Feller (1997)." In this regard,
years ago Lord Rutherford, a famous physical scien- 
tist, said that if you need statistics to analyze your 
data, you better redesign your experiment, and as a
result the word "statistics" was not highly regarded 
in the physical sciences and the term "probability 
theory" was employed by Jeffreys, Jaynes (2003) 
and many other physical scientists to include ap- 
plied and theoretical statistics, mathematical meth- 
ods, including elements of formal probability the- 
ory and philosophical aspects of science. With their 
narrow interpretation of Jeffreys's title, the authors
found many discussions in the book to be "irrel- 
evant," whereas Jeffreys considered them to be of 
fundamental importance and did not want to have 
his book limited to just mathematical topics, as in 
his and his wife's very famous book, Mathematical
Methods of Physics. And indeed, Good [(1980), page
32] wrote, "In summary, Jeffreys's pioneering work 
on neo-Bayesian methods. . . was stimulated by his 
interest in philosophy, mathematics, and physics, 
and has had a large permanent influence on sta- 
tistical logic and techniques. In my review Good 
(1962) I said that Jeffreys's book on probability "is 
of greater importance for the philosophy of science, 
and obviously of greater immediate practical impor- 
tance, than nearly all the books on probability writ- 
ten by professional philosophers lumped together." 
I believe this is still true, though more professional 
philosophers have woken up." 

With respect to the discussion of Chapter 1, read- 
ers will wonder what the authors mean by terms like 
"subjective," "objective," "objective priors" and "gen- 
uine prior information." Contrary to what the au- 
thors state, Jeffreys did adjust his "objective priors" 
(1) to get a "reasonable" amount of invariance, (2) 
to get "reasonable" results in the Laplace rule of
succession binomial problem, and (3) to correct for
"selection results" in testing many alternative mod- 
els with large sets of data. Thus he was not always 
an "objective" Bayesian but rather a very thought- 
ful Bayesian who recognized needs for better pro- 
cedures for certain problems and provided them in 
many cases. Perhaps he should be called a "prag- 
matic" Bayesian. 

Most important in Chapter 1 is Jeffreys's axiom 
system for learning from data and experience that 
is applicable to research in all fields of science. He 
considered deduction and induction at great length
in a most interesting, productive manner and the authors
provide interesting and useful comments. However,
the authors' introduction of decision theoretic
considerations in their discussion of point 1
fails to recognize that the decision theoretic solution
based on limited data, though "optimal," may not
be very good because of the limited data employed.
Good science requires testing models' explanatory
and predictive performance using much data in order
to ascertain the validity of a particular theory,
say Einstein's theory, and along the way many variants
of the original model will probably be considered
and tested. And finally, one has to specify a loss
or utility function. . . whose loss function? Errors in 
formulating loss or utility functions can vitally af- 
fect the quality of "optimal" decisions, as is well 
known. And to suggest that the debate about model 
choice was not present in Jeffreys's time overlooks 
the well-known debates that raged about Newton's 
"laws" versus Einstein's "laws" and the adequacy of 
quantum theory, etc., during the early 20th century 
and beyond, about which Jeffreys was fully aware. 
Further, the authors' statement in point 8 about 
grounding Theory of Probability within mathemat- 
ics fails to note that Jeffreys recognized that there is 
controversy about the foundations of mathematics. 
Still he pragmatically adopted point 8. 

Also of great importance in Chapter 1 are Jeffreys's comments
on his dissatisfaction with the standard proof
of the product rule of probability that is used to 
derive Bayes' theorem that led him to introduce 
the product rule of probability as an axiom, rather 
than a theorem in his system, as the authors note 
in their discussion of the derivation of Bayes' the- 
orem. Jeffreys noted that the assumption that the 
elements of the sets A, B and the intersection of 
A and B are equally likely to be drawn, all hav- 
ing a probability equal to 1/n, where n is the total 
number of elements, will not be satisfied in many 
cases. After stating that he was unable to prove the 
product rule without this assumption, he pragmat- 
ically introduced the result as Axiom 7 on page 25. 
Since many, including myself, worried about this ba- 
sic point, I was happy to discover that the proof of 
the product rule could be generalized by going to 
a hierarchical model with the probabilities for ele- 
ments of the sets assumed to have properties that 
produced the usual product rule of probability; see 
Zellner (2007). Further, earlier in my concern about
valid proofs or derivations of Bayes' theorem, I approached
the problem as an engineer might by considering
the informational inputs, namely the information
in a prior density and in a likelihood function,
and the output information, the information in
a posterior density for the parameters and a marginal 
density for the observations. On using Shannon's 
measure of information, it is possible to form an ex- 
pression, output information minus input informa- 
tion and to minimize it with respect to the choice 
of the form of the output or posterior density for 
the parameters. The solution is to take the posterior 
density equal to the prior density times the likeli- 
hood function divided by the marginal density of the 
observations, which is precisely the result yielded 
by Bayes' theorem. Also, when this solution is em- 
ployed, it is the case that the output information 
equals the input information and thus the procedure 
is 100% efficient. See Zellner (1988) for the detailed
results and commentary on them by E. T. Jaynes, B. 
M. Hill, S. Kullback and J. Bernardo, all reprinted 
in Zellner (1997b) along with solutions to variants of 
the above problem. For example, in some problems 
we may not have an input prior but just an input 
likelihood function. Then the solution to the minimization
problem is to take the posterior density
proportional to the likelihood function, a 100% efficient
solution that happens to be exactly the fiducial in- 
ference procedure suggested by R. A. Fisher who, 
as Jeffreys and others pointed out, did not have a 
theoretical justification for it. Also, other optimal in- 
formation processing results are presented that take 
account of the varying quality of input information, 
temporal relations of the inputs from one period to 
the output of the next period, etc. In effect, we now 
have a number of optimal learning models, not just 
one, Bayes' theorem, to use in learning from data 
and experience. Given that Jeffreys was deeply con- 
cerned about how to justify Bayes' theorem and how 
to learn effectively from data and experience, I hope 
that he likes these results that flowed from his con- 
cern about the validity and applicability of proofs of 
Bayes' theorem. 
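
The optimality property described above can be checked numerically. What follows is only a minimal sketch under stated assumptions, not the derivation in Zellner (1988): the parameter is restricted to three discrete values, the prior and likelihood numbers are made up for illustration, and the criterion is written as the sum of g ln g plus ln h, minus the sums of g ln(prior) and g ln(likelihood), with h the marginal probability of the data and all averages taken with respect to the candidate output density g. The function names are hypothetical.

```python
import math

def information_gap(g, prior, like):
    """Output information minus input information for a discrete parameter.

    gap = sum g*ln(g) + ln(h) - sum g*ln(prior) - sum g*ln(like),
    where h = sum prior*like is the marginal probability of the data
    and g is a proper candidate posterior (it must sum to one).
    """
    h = sum(p * l for p, l in zip(prior, like))
    total = math.log(h)
    for gi, pi, li in zip(g, prior, like):
        if gi > 0.0:  # terms with g = 0 contribute nothing
            total += gi * (math.log(gi) - math.log(pi) - math.log(li))
    return total

def bayes_posterior(prior, like):
    """Posterior = prior times likelihood divided by the marginal."""
    h = sum(p * l for p, l in zip(prior, like))
    return [p * l / h for p, l in zip(prior, like)]

# Illustrative (assumed) inputs on three parameter values.
prior = [0.2, 0.5, 0.3]
like = [0.10, 0.40, 0.25]

g_star = bayes_posterior(prior, like)
gap_bayes = information_gap(g_star, prior, like)

# Some other proper density over the same three points, for comparison.
g_other = [0.3, 0.4, 0.3]
gap_other = information_gap(g_other, prior, like)

print("gap at Bayes posterior:", gap_bayes)
print("gap at another density:", gap_other)
```

In this discrete setting the gap equals the Kullback-Leibler divergence of g from the Bayes posterior, so it is zero (up to rounding) at the Bayes posterior and positive for any other proper choice of g, which is the sense in which the rule is 100% efficient.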

With respect to the authors' comments on prior 
densities, in particular non-informative priors, they 
very thoughtfully review Jeffreys's innovative pro- 
cedure for producing non-informative priors with 
many critical remarks regarding his use and mis- 
use of unbounded measures. As regards a prior for 
the binomial parameter p, which can take on values
in the closed interval from zero to one, the authors consider
Laplace's uniform prior, Haldane's prior and
Jeffreys's prior, followed by a thoughtful discussion of
the famous Laplace Rule of Succession for analysis 
of which Jeffreys suggests putting lumps of proba- 
bility on the values zero and one and spreading out 
the remaining probability mass uniformly, zero to 
one in order to get "reasonable" results for Laplace's 
problem: given n independent dichotomous, bino- 
mial trials and observing n "successes" in n trials, 
what is the probability of a success on the next try? 
Jeffreys expressed his view that his non-informative
prior and Haldane's prior, which are symmetric around
one half and go to infinity at p = 0 and p = 1, put
too much mass in the neighborhoods of the extreme
points, 0 and 1, while the Laplace uniform prior
does not put enough mass in the neighborhood of
the extreme points, and so, again very pragmatically, he
suggested a modified Laplace mixed prior density. As I
have pointed out in Zellner [(1997b), page 117], my 
"maximal data information prior" (MDIP) for the 
binomial parameter p, in the closed interval 0 to 1,
is

$$f(p) = 1.6186\, p^{p} (1-p)^{1-p},$$

a density that is symmetric about p = 1/2, where it takes its minimal
value, and rises to 1.6186 at both p = 0 and p = 1.
It is thus "between" the uniform and Jeffreys's and 
Haldane's priors that shoot off to infinity at the end 
points. Further, a similar result is available for the 
multinomial model's parameters. Also, the criterion
functional that is optimized to produce this MDIP
for the binomial parameter, and many others, is an
information criterion functional (see Zellner, 1997b,
page 128ff for details). It has been used to produce
priors for many models and problems that are in
general invariant to linear transformations, can
be made invariant to other relevant transformations,
and are related to work by Jeffreys, Berger, Bernardo
and others on this difficult problem.
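
The numerical claims about the binomial MDIP are easy to check. The following is a rough sketch, assuming a simple midpoint rule on a fine grid (my choice here for illustration, with hypothetical function names) and the convention that 0^0 equals 1 at the end points. It recovers the normalizing constant of p^p (1-p)^(1-p) and then, for n = 1 success in n = 1 trial, compares the probability of a success on the next trial under Laplace's uniform prior, the MDIP and the Jeffreys Beta(1/2, 1/2) prior.

```python
import math

def mdip_kernel(p):
    """Unnormalized binomial MDIP kernel p^p (1-p)^(1-p), with 0^0 taken as 1."""
    out = 1.0
    if p > 0.0:
        out *= p ** p
    if p < 1.0:
        out *= (1.0 - p) ** (1.0 - p)
    return out

def midpoint_integral(func, n_cells=200_000):
    """Midpoint-rule approximation to the integral of func over [0, 1]."""
    h = 1.0 / n_cells
    return h * sum(func((i + 0.5) * h) for i in range(n_cells))

# Normalizing constant of the MDIP (the text reports 1.6186).
norm_const = 1.0 / midpoint_integral(mdip_kernel)

def mdip_density(p):
    """Normalized MDIP density on the closed interval 0 to 1."""
    return norm_const * mdip_kernel(p)

def next_success_prob_mdip(n):
    """P(success on trial n+1 | n successes in n trials) under the MDIP."""
    num = midpoint_integral(lambda p: p ** (n + 1) * mdip_kernel(p))
    den = midpoint_integral(lambda p: p ** n * mdip_kernel(p))
    return num / den

n = 1
uniform_rule = (n + 1) / (n + 2)      # Laplace's rule under the uniform prior
jeffreys_rule = (n + 0.5) / (n + 1)   # closed form under the Beta(1/2, 1/2) prior
mdip_rule = next_success_prob_mdip(n)

print("normalizing constant:", norm_const)
print("density at 1/2 and at 1:", mdip_density(0.5), mdip_density(1.0))
print("uniform, MDIP, Jeffreys:", uniform_rule, mdip_rule, jeffreys_rule)
```

The constant should come out close to 1.6186, with the density at p = 1/2 equal to half that value, and for n = 1 the MDIP answer falls between the uniform and Jeffreys answers, in line with the MDIP being "between" those priors in the mass it places near the end points.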

Further, in the case of a prior for a correlation co- 
efficient in a normal model, the authors present an 
"arc-sine" prior for the correlation coefficient that 
is exactly the MDIP for this parameter. Also, the 
MDIP approach has been applied to the AR(1) sta- 
tionary process, a problem discussed in the current 
paper and by Jeffreys. As explained in Zellner [(1997b),
page 138], the MDIP for this problem is $p(b, \sigma) =
c(1 - b^{2})^{1/2}/\sigma$, with $-1 < b < 1$. This contrasts markedly
with the Jeffreys prior $p(b, \sigma) = c'/[(1 - b^{2})^{1/2}\sigma]$
that the authors present without noting that Jeffreys
[(1967), page 359] states, "The [Jeffreys] estimate
rule gives a singularity at not only b = 1,
which might be tolerable, but also at b = -1, which
is not." Thus Jeffreys, always honest and pragmatic,
reports that his prior for this problem is intolerable.
See also the MDIP prior for parameters of a station- 
ary AR(2) process and many other models in Zellner 
(1997a). Given the remarkable properties of MDIPs 
and the general principle from which they are de- 
rived, it is indeed surprising that the authors make 
no mention of them. 
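
The contrast between the two AR(1) priors near the boundary of the stationarity region can be seen with a few evaluations. This is a small illustrative sketch only: it looks at the factor in b of each prior, drops the common 1/sigma factor and the constants c and c', and uses function names and evaluation points that are my own choices.

```python
import math

def mdip_kernel_b(b):
    """Factor in b of the MDIP: (1 - b^2)^(1/2), for -1 < b < 1."""
    return math.sqrt(1.0 - b * b)

def jeffreys_kernel_b(b):
    """Factor in b of the Jeffreys prior: (1 - b^2)^(-1/2), for -1 < b < 1."""
    return 1.0 / math.sqrt(1.0 - b * b)

# Evaluate both factors as b approaches the boundary value 1;
# by symmetry in b the same values apply as b approaches -1.
for b in (0.0, 0.9, 0.99, 0.999999):
    print(f"b = {b:<9} MDIP factor = {mdip_kernel_b(b):.6f}  "
          f"Jeffreys factor = {jeffreys_kernel_b(b):.2f}")
```

The MDIP factor goes to zero as b approaches either end point, whereas the Jeffreys factor grows without bound at both, which is the singularity at b = -1 that Jeffreys found intolerable.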

In closing, I shall quote the conclusions regard- 
ing Jeffreys's research contributions made by a lead- 
ing Bayesian statistician to provide readers with an 
alternative appreciation of Jeffreys's contributions 
that can be compared to that presented in the au- 
thors' paper. Seymour Geisser (1980) wrote: 

If one were to present a short selected 
summary of Jeffreys's contributions to 
Bayesian inference, I believe that the fol- 
lowing would be on everybody's list. 

(1) He made the inductive argument a 
"logical" one within the context of a Bayes- 
ian framework and maintained it could 
only be so within this framework. 

(2) He made a valiant attempt to quan- 
tify lack of knowledge by giving rather 
clever canonical rules and conventions but 
was not constrained to think only in these 
terms. 

(3) He produced a normative catalog of 
cogently reasoned Bayesian solutions to 
many conventional statistical paradigms. 

(4) He introduced and developed invari- 
ance considerations into the Bayesian sys- 
tem. 

(5) His devastating critiques of the vari- 
ous frequency theories propounded by 
Venn, Fisher, Neyman and others were,
in the words of de Finetti (1970), closely 
argued and unanswerable. 

In summary, Jeffreys's approach amalga- 
mated a Bayesian system with two primi- 
tive data principles reflective of public sci- 
entific work: (1) letting the data speak 
for themselves and (2) the actual units 
in which you choose to express your work 
should by and large not affect the infer- 
ence. This is translated into so-called non- 
informative priors and invariance under 
suitable transformations. It was a rather 
remarkable conception, brilliantly executed,
whose ultimate test is how it works in 
practice (19-20). 

Thanks again to the authors for their many in- 
sightful comments that are very relevant for apprais- 
ing Jeffreys's technical work and its mathematical 
basis. In this connection, some years ago I asked 
the famous statistician David Cox why the British 
have been so successful in the field of Statistics. He 
replied that British statisticians were well trained in 
applied mathematics, not theoretical mathematics. 
Perhaps this explains Jeffreys's limited knowledge 
of past measure theory and ignorance of recent re- 
sults on alternative limiting processes for defining 
unbounded measures that have appeared since his 
death in 1989. 